infrared image
SGDFuse: SAM-Guided Diffusion for High-Fidelity Infrared and Visible Image Fusion
Zhang, Xiaoyang, Li, jinjiang, Fan, Guodong, Ju, Yakun, Fan, Linwei, Liu, Jun, Kot, Alex C.
Infrared and visible image fusion (IVIF) aims to combine the thermal radiation information from infrared images with the rich texture details from visible images to enhance perceptual capabilities for downstream visual tasks. However, existing methods often fail to preserve key targets due to a lack of deep semantic understanding of the scene, while the fusion process itself can also introduce artifacts and detail loss, severely compromising both image quality and task performance. To address these issues, this paper proposes SGDFuse, a conditional diffusion model guided by the Segment Anything Model (SAM), to achieve high-fidelity and semantically-aware image fusion. The core of our method is to utilize high-quality semantic masks generated by SAM as explicit priors to guide the optimization of the fusion process via a conditional diffusion model. Specifically, the framework operates in a two-stage process: it first performs a preliminary fusion of multi-modal features, and then utilizes the semantic masks from SAM jointly with the preliminary fused image as a condition to drive the diffusion model's coarse-to-fine denoising generation. This ensures the fusion process not only has explicit semantic directionality but also guarantees the high fidelity of the final result. Extensive experiments demonstrate that SGDFuse achieves state-of-the-art performance in both subjective and objective evaluations, as well as in its adaptability to downstream tasks, providing a powerful solution to the core challenges in image fusion. The code of SGDFuse is available at https://github.com/boshizhang123/SGDFuse.
- Europe > United Kingdom > England > Leicestershire > Leicester (0.04)
- Asia > China > Shandong Province > Yantai (0.04)
- Asia > China > Fujian Province > Fuzhou (0.04)
- (2 more...)
- Asia > China > Shaanxi Province (0.04)
- Asia > China > Chongqing Province > Chongqing (0.04)
- Information Technology > Security & Privacy (1.00)
- Transportation > Ground > Road (0.34)
- Asia > China > Hubei Province > Wuhan (0.04)
- Asia > China > Fujian Province > Xiamen (0.04)
- Information Technology (0.67)
- Law (0.67)
Inference-Time Scaling of Diffusion Models for Infrared Data Generation
Horstmann, Kai A., Clouser, Maxim, Khezeli, Kia
Infrared imagery enables temperature-based scene understanding using passive sensors, particularly under conditions of low visibility where traditional RGB imaging fails. Yet, developing downstream vision models for infrared applications is hindered by the scarcity of high-quality annotated data, due to the specialized expertise required for infrared annotation. While synthetic infrared image generation has the potential to accelerate model development by providing large-scale, diverse training data, training foundation-level generative diffusion models in the infrared domain has remained elusive due to limited datasets. In light of such data constraints, we explore an inference-time scaling approach using a domain-adapted CLIP-based verifier for enhanced infrared image generation quality. We adapt FLUX.1-dev, a state-of-the-art text-to-image diffusion model, to the infrared domain by finetuning it on a small sample of infrared images using parameter-efficient techniques. The trained verifier is then employed during inference to guide the diffusion sampling process toward higher quality infrared generations that better align with input text prompts. Empirically, we find that our approach leads to consistent improvements in generation quality, reducing FID scores on the KAIST Multispectral Pedestrian Detection Benchmark dataset by 10% compared to unguided baseline samples. Our results suggest that inference-time guidance offers a promising direction for bridging the domain gap in low-data infrared settings.
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > Dominican Republic (0.04)
- Transportation > Ground > Road (0.35)
- Automobiles & Trucks (0.35)
- Asia > China > Shaanxi Province (0.04)
- Asia > China > Chongqing Province > Chongqing (0.04)
- Information Technology > Security & Privacy (1.00)
- Transportation > Ground > Road (0.34)
- Asia > China > Hubei Province > Wuhan (0.04)
- Asia > China > Fujian Province > Xiamen (0.04)
- Information Technology (0.67)
- Law (0.67)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Deep Learning Methods for Detecting Thermal Runaway Events in Battery Production Lines
Athanasopoulos, Athanasios, Mihalák, Matúš, Pietrasik, Marcin
One of the key safety considerations of battery manufacturing is thermal runaway, the uncontrolled increase in temperature which can lead to fires, explosions, and emissions of toxic gasses. As such, development of automated systems capable of detecting such events is of considerable importance in both academic and industrial contexts. In this work, we investigate the use of deep learning for detecting thermal runaway in the battery production line of VDL Nedcar, a Dutch automobile manufacturer. Specifically, we collect data from the production line to represent both baseline (non thermal runaway) and thermal runaway conditions. Thermal runaway was simulated through the use of external heat and smoke sources. The data consisted of both optical and thermal images which were then preprocessed and fused before serving as input to our models. In this regard, we evaluated three deep-learning models widely used in computer vision including shallow convolutional neural networks, residual neural networks, and vision transformers on two performance metrics. Furthermore, we evaluated these models using explainability methods to gain insight into their ability to capture the relevant feature information from their inputs. The obtained results indicate that the use of deep learning is a viable approach to thermal runaway detection in battery production lines.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > Netherlands > Limburg > Maastricht (0.04)
- Europe > Germany (0.04)
- (2 more...)
- Research Report (0.82)
- Overview (0.68)
- Energy > Energy Storage (1.00)
- Electrical Industrial Apparatus (1.00)
- Automobiles & Trucks (1.00)
- Transportation > Ground > Road (0.34)
DFVO: Learning Darkness-free Visible and Infrared Image Disentanglement and Fusion All at Once
Zhou, Qi, Shi, Yukai, Yang, Xiaojun, Xian, Xiaoyu, Liao, Lunjia, Zhang, Ruimao, Lin, Liang
Visible and infrared image fusion is one of the most crucial tasks in the field of image fusion, aiming to generate fused images with clear structural information and high-quality texture features for high-level vision tasks. However, when faced with severe illumination degradation in visible images, the fusion results of existing image fusion methods often exhibit blurry and dim visual effects, posing major challenges for autonomous driving. To this end, a Darkness-Free network is proposed to handle Visible and infrared image disentanglement and fusion all at Once (DFVO), which employs a cascaded multi-task approach to replace the traditional two-stage cascaded training (enhancement and fusion), addressing the issue of information entropy loss caused by hierarchical data transmission. Specifically, we construct a latent-common feature extractor (LCFE) to obtain latent features for the cascaded tasks strategy. Firstly, a details-extraction module (DEM) is devised to acquire high-frequency semantic information. Secondly, we design a hyper cross-attention module (HCAM) to extract low-frequency information and preserve texture features from source images. Finally, a relevant loss function is designed to guide the holistic network learning, thereby achieving better image fusion. Extensive experiments demonstrate that our proposed approach outperforms state-of-the-art alternatives in terms of qualitative and quantitative evaluations. Particularly, DFVO can generate clearer, more informative, and more evenly illuminated fusion results in the dark environments, achieving best performance on the LLVIP dataset with 63.258 dB PSNR and 0.724 CC, providing more effective information for high-level vision tasks. Our code is publicly accessible at https://github.com/DaVin-Qi530/DFVO.
Spectral Enhancement and Pseudo-Anchor Guidance for Infrared-Visible Person Re-Identification
Ge, Yiyuan, Chen, Zhihao, Wang, Ziyang, Kang, Jiaju, Zhang, Mingya
The development of deep learning has facilitated the application of person re-identification (ReID) technology in intelligent security. Visible-infrared person re-identification (VI-ReID) aims to match pedestrians across infrared and visible modality images enabling 24-hour surveillance. Current studies relying on unsupervised modality transformations as well as inefficient embedding constraints to bridge the spectral differences between infrared and visible images, however, limit their potential performance. To tackle the limitations of the above approaches, this paper introduces a simple yet effective Spectral Enhancement and Pseudo-anchor Guidance Network, named SEPG-Net. Specifically, we propose a more homogeneous spectral enhancement scheme based on frequency domain information and greyscale space, which avoids the information loss typically caused by inefficient modality transformations. Further, a Pseudo Anchor-guided Bidirectional Aggregation (PABA) loss is introduced to bridge local modality discrepancies while better preserving discriminative identity embeddings. Experimental results on two public benchmark datasets demonstrate the superior performance of SEPG-Net against other state-of-the-art methods. The code is available at https://github.com/1024AILab/ReID-SEPG.
- Asia > Singapore (0.05)
- Asia > China > Beijing > Beijing (0.05)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- (2 more...)
TeX-NeRF: Neural Radiance Fields from Pseudo-TeX Vision
Neural radiance fields (NeRF) has gained significant attention for its exceptional visual effects. However, most existing NeRF methods reconstruct 3D scenes from RGB images captured by visible light cameras. In practical scenarios like darkness, low light, or bad weather, visible light cameras become ineffective. Therefore, we propose TeX-NeRF, a 3D reconstruction method using only infrared images, which introduces the object material emissivity as a priori, preprocesses the infrared images using Pseudo-TeX vision, and maps the temperatures (T), emissivities (e), and textures (X) of the scene into the saturation (S), hue (H), and value (V) channels of the HSV color space, respectively. Novel view synthesis using the processed images has yielded excellent results. Additionally, we introduce 3D-TeX Datasets, the first dataset comprising infrared images and their corresponding Pseudo-TeX vision images. Experiments demonstrate that our method not only matches the quality of scene reconstruction achieved with high-quality RGB images but also provides accurate temperature estimations for objects in the scene.
- North America > United States (0.14)
- Asia > China > Beijing > Beijing (0.04)
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)